A Reproducible Health Informatics Pipeline for Simulating and Integrating Early-Phase Oncology Clinical, Biomarker, and Pharmacokinetic Data for Exploratory Decision-Support Analytics

This paper presents a reproducible Python-based workflow that simulates and integrates early-phase oncology clinical, biomarker, and pharmacokinetic data to generate analysis-ready datasets, visualizations, and exploratory predictive models for translational decision support.

Petalcorin, M. I. R. | 2026-04-02 | health informatics

Counterfactual prediction of treatment effects on irregular clinical data using Time-Aware G-Transformers

The paper introduces the Time-Aware G-Transformer, a deep learning model that combines causal G-computation with time-aware attention to accurately predict counterfactual treatment effects on irregular, heterogeneous clinical data, outperforming existing methods in long-horizon forecasting and uncertainty calibration for personalized medicine.

Hornak, G., Heinolainen, A., Solyomvari, K. + 3 more | 2026-04-02 | health informatics

Governance, Accountability and Post-Deployment Monitoring Preferences for AI Integration in West African Clinical Practice: A Mixed-Methods Study

This mixed-methods study of West African clinicians and technical experts reveals a strong preference for independent regulatory oversight, transparent algorithms, and clear accountability frameworks to ensure safe and equitable AI integration in clinical practice, while highlighting significant concerns regarding vendor control and potential unfair liability for medical errors.

Uzochukwu, B. S. C., Cherima, Y. J., Enebeli, U. U. + 8 more | 2026-04-01 | health informatics

Self-Reported Symptoms Enable Four-Phase Menstrual Cycle Classification with Hormonally Validated Labels

This study demonstrates that a hybrid machine learning framework combining gradient boosting and Hidden Semi-Markov Models can accurately classify four menstrual cycle phases using only self-reported symptoms, achieving 67.6% accuracy and establishing symptom dynamics as a scalable, device-free digital biomarker for reproductive health.

Specht, B., Tayeb, Z. Z., Garbaya, S. + 3 more | 2026-04-01 | health informatics

MedScope: A Lightweight Benchmark of Open-Source Large Language Models for Medical Question Answering

This paper introduces MedScope, a lightweight benchmarking framework that systematically evaluates six open-source large language models on medical multiple-choice questions using multi-dimensional metrics and visual analyses, revealing significant performance heterogeneity and highlighting their current unsuitability for unsupervised high-risk clinical deployment despite their value as transparent baselines.

Bian, R., Cheng, W. | 2026-04-01 | health informatics

Data sharing policies, requirements, and support from public and private clinical trial sponsors: a survey on top sponsors of clinical trials in Europe

This survey of the top 40 public and private clinical trial sponsors in Europe reveals a significant sectoral imbalance in data sharing governance: private sponsors generally offer more detailed and accessible operational documentation, whereas public sponsors tend to provide high-level commitments that lack trial-specific guidance, even though both sectors adhere to GDPR requirements.

Tai, K. H., Varvara, G., Escoffier, E. + 4 more | 2026-04-01 | health informatics

Combining Token Classification With Large Language Model Revision for Age-Friendly 4M Entity Recognition From Nursing Home Text Messages: Development and Evaluation Study

This study presents and evaluates a multi-stage pipeline that combines a fine-tuned Bio-ClinicalBERT token classifier with locally deployed open-source large language models for revision. Compared to single-stage models, this hybrid approach significantly improves the accuracy and efficiency of extracting structured Age-Friendly 4M (What Matters, Medication, Mentation, and Mobility) information from informal nursing home text messages.

Amewudah, P., Popescu, M., Farmer, M. S. + 1 more | 2026-04-01 | health informatics

Longitudinal information extraction from clinical notes in rare diseases: an efficient approach with small language models

This study demonstrates that small language models (SLMs) can effectively and efficiently extract longitudinal serum creatinine data from unstructured French clinical notes for rare kidney diseases, offering a privacy-preserving and resource-efficient alternative to traditional methods despite challenges with text duplication and language nuances.

Wang, X., Faviez, C., Vincent, M. + 8 more | 2026-03-31 | health informatics

MedResearchBench: A Multi-Domain Benchmark for Evaluating AI Research Agents on Clinical Medical Research

MedResearchBench introduces the first multi-domain benchmark specifically designed to evaluate AI research agents on clinical medical tasks by leveraging public datasets and high-quality ground truth to assess performance across seven clinical domains using six medical-specific dimensions, thereby addressing the critical gap in evaluating AI's ability to conduct publication-quality, clinically sound research.

Tan, S., Tian, Z. | 2026-03-31 | health informatics

BSO-AD: An Ontology for Representing and Harmonizing Behavioral Social Knowledge in ADRD

The authors developed BSO-AD, the first ontology to systematically represent and harmonize behavioral and social factors influencing Alzheimer's disease and related dementias. It integrates existing ontologies and literature-derived relationships, and its structure and coverage were validated through expert review and an LLM-assisted evaluation framework.

Li, H., Yu, Y., Bhandarkar, A. + 7 more | 2026-03-31 | health informatics

VaaS is a Multi-Layer Hallucination Reduction Pipeline for AI-Assisted Science: Production Validation and Prospective Benchmarking

This paper introduces and validates VaaS, a multi-layer, cost-effective pipeline that reduces AI hallucinations in scientific citation generation to near-zero levels through iterative refinement and rigorous benchmarking, thereby enabling reliable AI-assisted biomedical research at production scale.

Sabharwal, A., Patel, M. S., Carrano, A. + 3 more | 2026-03-30 | health informatics

Learning Patient-Specific Event Sequence Representations for Clinical Process Analysis

This paper introduces ClinicalTAAT, a time-aware transformer model that learns interpretable, patient-specific representations from sparse and irregular clinical event sequences to outperform existing methods in acuity classification, subgroup identification, and anomaly detection, thereby offering a scalable framework for data-driven healthcare process analysis.

Solyomvari, K., Antikainen, T., Moen, H. + 3 more | 2026-03-30 | health informatics

Measuring the Unmeasurable: A Diagnostic Sensor for AI Reasoning Pathology in Sequential Clinical Decision-Making

This study introduces a diagnostic framework and the Sequential Information Prioritization Scaffold (SIPS) to reveal and quantify a critical "Access-Stability Dissociation" in LLMs during sequential clinical reasoning. Structured scaffolding eliminates pathological convergence failures by making reasoning auditable, but it also exposes a model-specific trade-off between hypothesis retention and final diagnostic accuracy.

Wang, S. | 2026-03-30 | health informatics

HealthFormer: Dual-level time-aware Transformers for irregular electronic health record events

The paper proposes HealthFormer, a dual-level, time-aware Transformer framework pretrained on large-scale longitudinal EHRs using multi-task self-supervision to learn hierarchical, event-centric patient representations that achieve state-of-the-art performance in incident cancer prediction through straightforward fine-tuning.

Körösi-Szabo, P., Kovacs, G., Csiszarik, A. + 4 more | 2026-03-27 | health informatics